In-memory URL Compression using AVL Tree

نویسندگان

  • Kasom Koht-arsa
  • Surasak Sanguanpong
چکیده

A common problem of large scale search engines and web spiders is how to handle a huge number of encountered URLs. Traditional search engines and web spiders use hard disk to store URLs without any compression. This results in slow performance and more space requirement. This paper describes a simple URL compression algorithm allowing efficient compression and decompression. The compression algorithm is based on a delta encoding scheme to extract URLs sharing common prefixes and an AVL tree to get efficient search speed. Our results show that the 50% of size reduction is achieved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

In-memory URL Compression

A common problem of large scale search engines and web spiders is how to handle a huge number of encountered URLs. Traditional search engines and web spiders use hard disk to store URLs without any compression. This results in slow performance and more space requirement. This paper describes a simple URL compression algorithm allowing efficient compression and decompression. The compression alg...

متن کامل

New Combinatorial Properties and Algorithms for AVL Trees

In this thesis, new properties of AVL trees and a new partitioning of binary search trees named core partitioning scheme are discussed, this scheme is applied to three binary search trees namely AVL trees, weight-balanced trees, and plain binary search trees. We introduce the core partitioning scheme, which maintains a balanced search tree as a dynamic collection of complete balanced binary tre...

متن کامل

Combining HTM and RCU to Implement Highly Efficient Balanced Binary Search Trees

In this paper we combine Hardware Transactional Memory (HTM) with Read-Copy-Update (RCU) to implement highly scalable concurrent balanced Binary Search Trees (BSTs). The two key features of our approach are: a) read-only operations require no synchronization or restarts and b) tree modifications are first performed in private copies of subtrees, then HTM is used to validate their consistency, a...

متن کامل

Cache-sensitive Memory Layout for Binary Trees

We improve the performance of main-memory binary search trees (including AVL and red-black trees) by applying cache-sensitive and cacheoblivious memory layouts. We relocate tree nodes in memory according to a multi-level cache hierarchy, also considering the conflict misses produced by set-associative caches. Moreover, we present a method to improve onelevel cache-sensitivity without increasing...

متن کامل

AVL Trees with Relaxed Balance

The idea of relaxed balance is to uncouple the rebalancing in search trees from the updating in order to speed up request processing in main-memory databases. In this paper, we describe a relaxed version of AVL trees. We prove that each update gives rise to at most a logarithmic number of rebalancing operations and that the number of rebalancing operations in the semidynamic case is amortized c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003